Skip to content

fix(azure): set DiskControllerType in StorageProfile for RHEL AI VMs#823

Closed
rishupk wants to merge 1 commit into
redhat-developer:mainfrom
rishupk:worktree-fix+azure-disk-controller-type
Closed

fix(azure): set DiskControllerType in StorageProfile for RHEL AI VMs#823
rishupk wants to merge 1 commit into
redhat-developer:mainfrom
rishupk:worktree-fix+azure-disk-controller-type

Conversation

@rishupk
Copy link
Copy Markdown
Contributor

@rishupk rishupk commented Jun 4, 2026

RHEL AI VMs on certain instance families (Standard_L8aos_v4 and others) fail at boot because Azure infers NVMe as the disk controller type, but the gallery images only work with SCSI. This sets DiskControllerType: "SCSI" in StorageProfile through a new field on ImageReference. It also reads the disk controller types from the gallery image definition and filters compute sizes at allocation — spot and non-spot — so an incompatible size fails with a clear error before Azure ever sees the request.

Co-Authored-By: Claude <Sonnet 4.6> noreply@anthropic.com
Signed-off-by: Rishabh Kothari rkothari@redhat.com

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

Warning

Review limit reached

@rishupk, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 59 minutes and 57 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: c6f9b653-d3c1-42f2-8873-4758f1c29fa5

📥 Commits

Reviewing files that changed from the base of the PR and between 35fba64 and 9da27f0.

📒 Files selected for processing (7)
  • pkg/provider/azure/action/rhel-ai/rhelai.go
  • pkg/provider/azure/data/compute-request.go
  • pkg/provider/azure/data/imageref.go
  • pkg/provider/azure/data/images.go
  • pkg/provider/azure/data/util.go
  • pkg/provider/azure/modules/allocation/allocation.go
  • pkg/provider/azure/modules/virtual-machine/virtual-machine.go
📝 Walkthrough

Walkthrough

This PR adds disk controller type support to Azure VM allocation and creation. It introduces data model fields for storing disk controller types, Azure API functions for querying SKU and image capabilities, allocation logic to filter compute sizes by disk controller compatibility, and VM resource wiring to set the disk controller type on created instances.

Changes

Disk Controller Type Support

Layer / File(s) Summary
Data model and utility functions
pkg/provider/azure/data/imageref.go, pkg/provider/azure/data/util.go
ImageReference struct gains DiskControllerType field; splitDiskControllerTypes helper parses comma-separated controller type strings.
Azure API querying for disk controller types
pkg/provider/azure/data/images.go, pkg/provider/azure/data/compute-request.go
GetSharedImageDiskControllerTypes queries Azure Shared Image definitions for disk controller capabilities; FilterComputeSizesByDiskControllerType filters compute SKUs by controller type availability in a location.
Allocation filtering and image reference resolution
pkg/provider/azure/modules/allocation/allocation.go
Allocation derives effective image references via resolveImageRef and filters both spot and non-spot compute sizes by disk controller type, returning compatible sizes or explicit errors.
VM resource creation with disk controller type
pkg/provider/azure/action/rhel-ai/rhelai.go, pkg/provider/azure/modules/virtual-machine/virtual-machine.go
RHEL-AI action sets disk controller type to SCSI on image reference; VM module wires disk controller type to Pulumi OS disk parameters via diskControllerTypePtr helper.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 41.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: setting DiskControllerType in StorageProfile for RHEL AI VMs, which aligns directly with the changeset's primary objective.
Description check ✅ Passed The description is directly related to the changeset, explaining the problem (RHEL AI VMs fail on certain instance families due to disk controller type mismatch), the solution (setting DiskControllerType to SCSI), and additional improvements (reading from gallery and filtering compute sizes).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/provider/azure/data/images.go`:
- Around line 101-103: The code currently uses AZURE_SUBSCRIPTION_ID to build
the compute client (subscriptionId -> armcompute.NewClientFactory), but for
shared/gallery images you must extract the subscription from the image ARM
resource ID (the variable id) instead of the caller default; update the logic in
images.go to parse the subscription segment from the image ARM ID (id), assign
that to subscriptionId and pass it into
armcompute.NewClientFactory(idSubscription, cred, nil) so cross-subscription
gallery images create the correct client and do not silently disable
enrichment/filtering.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 1d9bb72a-065d-44ba-a841-e2c4fa789abb

📥 Commits

Reviewing files that changed from the base of the PR and between 7c5d50d and 35fba64.

📒 Files selected for processing (7)
  • pkg/provider/azure/action/rhel-ai/rhelai.go
  • pkg/provider/azure/data/compute-request.go
  • pkg/provider/azure/data/imageref.go
  • pkg/provider/azure/data/images.go
  • pkg/provider/azure/data/util.go
  • pkg/provider/azure/modules/allocation/allocation.go
  • pkg/provider/azure/modules/virtual-machine/virtual-machine.go

Comment thread pkg/provider/azure/data/images.go Outdated
@rishupk rishupk force-pushed the worktree-fix+azure-disk-controller-type branch from 35fba64 to 99c8a46 Compare June 4, 2026 12:18
@rishupk rishupk marked this pull request as draft June 4, 2026 12:25
@rishupk rishupk force-pushed the worktree-fix+azure-disk-controller-type branch from 99c8a46 to 6cfecf5 Compare June 4, 2026 13:16
- Add DiskControllerType field to ImageReference; rhelai sets it to "SCSI"
  as a fallback when the gallery API is unavailable
- Wire ImageReference.DiskControllerType into StorageProfile in
  virtual-machine.go via diskControllerTypePtr helper
- Add GetSharedImageDiskControllerTypes to read disk controller types from
  gallery image definition features (not the version)
- Add FilterComputeSizesByDiskControllerType combining location and disk
  controller type filtering in a single SKU enumeration; replaces the
  previous two-pass approach in allocation.go
- resolveImageRef enriches DiskControllerType from gallery API only when
  the image supports exactly one type (capability != requirement)
- Filter both spot and non-spot compute sizes; return a clear error if no
  compatible sizes remain after filtering

Co-Authored-By: Claude <Sonnet 4.6> <noreply@anthropic.com>
Signed-off-by: Rishabh Kothari <rkothari@redhat.com>
@rishupk rishupk force-pushed the worktree-fix+azure-disk-controller-type branch from 6cfecf5 to 9da27f0 Compare June 4, 2026 13:36
@rishupk rishupk marked this pull request as ready for review June 4, 2026 13:42
@rishupk rishupk marked this pull request as draft June 5, 2026 09:49
@rishupk
Copy link
Copy Markdown
Contributor Author

rishupk commented Jun 5, 2026

Superseded by #826 (combined PR per reviewer request).

@rishupk rishupk closed this Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant